[perf] refactor: optimizes DataProtoFuture to use fractional lazy fetching of futures#6234
Open
yurun00 wants to merge 1 commit intoverl-project:mainfrom
Open
[perf] refactor: optimizes DataProtoFuture to use fractional lazy fetching of futures#6234yurun00 wants to merge 1 commit intoverl-project:mainfrom
yurun00 wants to merge 1 commit intoverl-project:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request refactors the DataProtoFuture class in verl/protocol.py to use fractional offsets (start_fraction and end_fraction) for data chunking, replacing the previous collect_fn and dispatch_fn mechanism. The changes include a more precise chunk implementation using the fractions module and an updated get method that slices data based on these fractions. Additionally, the PR includes minor cleanups to np.reshape calls and introduces comprehensive unit tests for DataProtoFuture in tests/test_protocol_on_cpu.py. A potential precision issue was identified in the get method where floating-point multiplication followed by integer truncation could lead to off-by-one errors in indexing.
This commit optimizes the `DataProtoFuture` by replacing the brittle `collect_fn` and `dispatch_fn` mechanisms with robust, native chunking logic associating futures with `start_fraction` and `end_fraction`. Key changes: - Removed obsolete `collect_fn` and `dispatch_fn` properties from `DataProtoFuture`. - Added properties `start_fraction` and `end_fraction` to track chunking offsets in start and end futures directly. - Implemented fractional range calculations in `chunk` to accurately track boundaries across multiple chunks. - Streamlined the `get` method to materialize data efficiently using integer offsets, avoiding overhead of fetching unnecessary futures - Updated `tests/test_protocol_on_cpu.py` to cover the new fractional splitting logic. Co-authored-by: Antigravity Signed-off-by: Run Yu <yurun00@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
This PR optimizes the
DataProtoFutureby replacing the brittlecollect_fnanddispatch_fnmechanisms with robust, native chunking logic associating futures withstart_fractionandend_fraction.Key changes:
collect_fnanddispatch_fnproperties fromDataProtoFuture.start_fractionandend_fractionto track chunking offsets in start and end futures directly.chunkto accurately track boundaries across multiple chunks.getmethod to materialize data efficiently using integer offsets, avoiding overhead of fetching unnecessary futurestests/test_protocol_on_cpu.pyto cover the new fractional splitting logic.Co-authored-by: Antigravity
Checklist Before Starting
is:pr is:open DataProtoFuture[{modules}] {type}: {description}(This will be checked by the CI){modules}includesingle_controller{type}isrefactorTest
Ran the updated CPU unit tests locally to validate the new fractional splitting logic:
cpu unit test: https://github.com/yurun00/verl/actions/runs/25240216728
sgl: https://github.com/yurun00/verl/actions/runs/25240216710
vllm: https://github.com/yurun00/verl/actions/runs/25240216727
API and Usage Example
No API change
Design & Code Changes
As shown in key changes above
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)recipesubmodule, please also update the reference to the submodule commit viagit submodule update --remoteorcd recipe && git pull origin main.